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Abstract 

The fast growing scale and heterogeneity of current communication networks necessitate the 
design of distributed cross-layer optimization algorithms. So far, the standard approach of dis- 
tributed cross-layer design is based on dual decomposition and the subgradient algorithm, which 
is a first-order method that has a slow convergence rate. In this paper, we focus on solving a joint 
multi-path routing and flow control (MRFC) problem by designing a new distributed Newton's 
method, which is a second-order method and enjoys a quadratic rate of convergence. The major 
challenges in developing a distributed Newton's method lie in decentralizing the computation of 
the Hessian matrix and its inverse for both the primal Newton direction and dual variable up- 
dates. By appropriately reformulating, rearranging, and exploiting the special problem structures, 
we show that it is possible to decompose such computations into source nodes and links in the 
network, thus eliminating the need for global information. Furthermore, we derive closed-form 
expressions for both the primal Newton direction and dual variable updates, thus significantly 
reducing the computational complexity. The most attractive feature of our proposed distributed 
Newton's method is that it requires almost the same scale of information exchange as in first-order 
methods, while achieving a quadratic rate of convergence as in centralized Newton methods. We 
provide extensive numerical results to demonstrate the efficacy of our proposed algorithm. Our 
work contributes to the advanced paradigm shift in cross-layer network design that is evolving 
from first-order to second-order methods. 

1 Introduction 

The scale of current communication networks has been growing rapidly in recent years as the demand 
for data access continues to increase exponentially. As a result, maintaining a centralized network 
control unit has become increasingly difficult or even undesirable in many situations, such as in multi- 
path routing and congestion control in the Internet, sensor networks, or ad hoc networks. In such 
cases, distributed algorithms are not only desirable, but also necessary. 
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In the literature, the standard distributed approach for jointly optimizing multi-path routing and 
flow control (MRFC) is based on the Lagrangian dual decomposition framework and the subgradient 
method for dual updates, where subgradients (based on first-order supporting of the dual function) are 
used as search directions (see, e.g., [1-3] and references therein). The dual decomposition framework 
and the subgradient approach are also related to the celebrated throughput-optimal "back-pressure" 
algorithm, which is the foundation of a large number of interesting routing and scheduling schemes 
(see, e.g., [4,5] and many other later works). 1 However, despite its simplicity and theoretical appeal, 
the subgradient method does not work well in practice. This is because the subgradient method, a 
first-order approach in nature, has a slow rate of convergence (and typically exhibits a zigzagging 
phenomenon if the objective function is ill-conditioned) and is very sensitive to step-size selections. 
These limitations motivate us to design a distributed Newton algorithm for the MRFC problem. The 
fundamental rationale of using a distributed Newton method is that, being a second-order approach, 
a distributed Newton method would exploit both first and second-order information (more precisely, 
the gradient and Hessian of the underlying problem) in determining search directions. As a result, a 
properly designed distributed Newton algorithm also enjoys the same quadratic rate of convergence 
as in the classical Newton type methods [6,7]. 

However, due to a number of major technical challenges, research on second-order based dis- 
tributed algorithms for network optimization is still in its infancy and results are rather limited. To 
our knowledge, only a handful of works exist in this area (see Section 2 for more detailed discussions). 
The first major technical challenge is that the computation of primal Newton direction in a second- 
order method typically requires taking the inverse of the Hessian matrix of the underlying problem (or 
solving a linear equation system), which is not always easy by itself for large-scale problems, let alone 
being done in a distributed fashion. Therefore, when designing distributed second-order algorithms, 
one needs to figure out how to decompose the inverse of Hessian matrix and distribute each piece 
to each network entity (i.e., a node or a link) in such a way that each piece can be computed using 
only local information or via a limited scale of information exchange between network entities. This 
task is not trivial except in some special problems, and certainly not in our case. The second major 
challenge is that, as we shall see later, the computation of dual variables (which represent certain 
pricing information) in a distributed second-order method also requires taking the inverse of some 
complex transformation of the Hessian matrix and needs global information. In fact, how to compute 
the dual variables in a distributed way has remained largely unaddressed until very recently, when 
some interesting ideas based on Gaussian belief propagation (to avoid direct matrix inversion [8]) 
or matrix splitting (to iteratively compute the matrix inverse [9]) were proposed for some relatively 

1 It can be shown by some appropriate scaling that the queue lengths can be interpreted as the dual variables in the 
dual decomposition framework - see Section 4 for a more detailed discussion. 
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simpler network optimization problems [10-13]. However, it remains unclear whether these ideas can 
be readily extended to more complex cross-layer optimization problems. Therefore, our goals in this 
work are centered around tackling these difficulties. The main results and contributions in our work 
are as follows: 

• We show that, by appropriately reformulating and rearranging, it is possible to expose a block 
diagonal structure in the Hessian matrix of the MRFC problem. As a result, the Hessian matrix 
can be decomposed with respect to source nodes and links in the network. Furthermore, we 
show that the inverse of each submatrix can be computed in closed-form, thus significantly 
reducing the computational complexity. This complexity reduction is made possible by a keen 
observation of the special structure of the submatrix for each network entity. 

• Based on the decomposable structure of the Hessian matrix and the special second-order prop- 
erties of the coefficient matrix of the MRFC problem, we further extend and generalize the 
matrix splitting idea of [12, 13] to the more complex MRFC problem. Also, we introduce a 
parameterized matrix splitting scheme so that the convergence performance of the iterative 
scheme for computing dual variables is tunable. 

• In addition to deriving closed-form expressions for the primal Newton direction and dual updates 
for each network entity, we also provide insights into the underlying networking interpretations 
of the proposed distributed Newton algorithm, as well as the connections to and differences from 
first-order approaches, thus further advancing our understanding of second-order approaches in 
network optimization theory. 

To our knowledge, this paper is the first work that develops a distributed Newton algorithm for 
joint multi-path routing and flow control optimization. Our work contributes to a new and exciting 
paradigm shift in cross-layer network design that is evolving from first-order to second-order methods. 
We believe that, just as the intimate connection between the subgradient-based method and the 
"back-pressure" algorithm in the first-order paradigm, an interesting second-order version of the 
"back-pressure" algorithm may soon emerge, finding its roots in our proposed distributed Newton 
algorithm. 

The remainder of this paper is organized as follows. In Section 2, we review some related work 
in the literature, putting our work in a comparative perspective. Section 3 introduces the network 
model and problem formulation. Section 4 briefly reviews the first-order decomposition approach and 
the subgradient algorithm to facilitate comparisons between first-order and second-order methods. 
Section 5 provides some preliminary knowledge of the centralized Newton method and points out the 
difficulties in distributed implementations. Section 6 is the key part of this paper, which develops the 
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principal components of our proposed distributed Newton method. Section 7 provides some relevant 
numerical results, and Section 8 concludes this paper. 

2 Related Work 

Early attempts at second-order methods for network optimization (centralized or distributed) date 
back to the 1980s [14, 15]. In [14], Bertsekas and Gafni employed a projected Newton method for 
multi-commodity flow problems. The authors adapted a conjugate gradient approach [6] such that 
computing and storing the Hessian matrix is not needed. However, a distributed implementation 
was not considered in this work. In [15], Klincewicz also proposed a distributed conjugate gradient 
direction method to solve a pure minimum cost flow routing problem where the network flows on each 
link are subject to more restrictive individual box-like constraints, as opposed to the more realistic 
sum capacity constraint over each link in this paper. It was shown in [15] that feasible conjugate 
gradient directions can be computed distributedly using information exchange along a spanning tree. 
However, the spanning tree computation still requires passing all information to a centralized node. 
We remark that these conjugate gradient algorithms belong to the class of quasi-Newton methods 
that approximate the quadratically convergent Newton method by employing the notion of conjugate 
direction (see [6] for more details). As a result, the convergence speed is relatively slower compared 
to pure Newton methods, although they could be simpler and more robust. A more recent work 
on a distributed quasi-Newton method was reported in [16], where Bolognani and Zampieri showed 
that the celebrated BFGS algorithm [6] can be decentralized to solve the optimal reactive power flow 
problem in smart-grids. 

We also note that the early attempts in [14, 15] differ fundamentally from our work in that they 
rely on projecting gradients to find feasible search directions, while our algorithm belongs to the class 
of interior-point methods, which have been shown to more efficient [17]. Indeed, most of the recent 
works in this area are based on the interior-point approach [10-13,18]. The first known interior-point 
based algorithm for a pure flow control problem (i.e., routes are fixed) was reported in [18], where 
Zymnis et al. proposed a centralized truncated-Newton primal-dual interior-point method. For the 
same problem, Bickson et al. [10,11] later developed a distributed algorithm based on Gaussian belief 
propagation technique, but without providing a provable guarantee for its convergence. On the other 
hand, Jadbabaie et al. [12] designed a distributed Newton method for solving a pure minimum cost 
routing problem (i.e., source flow rates are fixed), where a consensus-based local averaging scheme 
was used to compute the Newton direction. Although convergence of the consensus-based scheme can 
be established by using spectral graph theory [19], we note that its convergence rate could potentially 
be slow in practice. Our work is most related to [13], although [13] studied the same pure flow 
control problem as in [10, 11, 18], while we consider a more complex joint multi-path routing and flow 
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control problem. As mentioned earlier, our dual update scheme is inspired by, and is a generalization 
of the matrix-splitting technique used in [13]. Although there exists some similarity in our matrix 
splitting approach with that in [13], we point out that due to a completely different network setting, 
showing the applicability of the matrix splitting technique in our problem is not straightforward and 
the resulting distributed algorithm is completely different. Moreover, several interesting networking 
insights can be drawn from these new analyses and proofs. 

Other than employing the aforementioned second-order methods, it is worth pointing out that the 
first-order subgradient method could also be modified to somewhat mimic the behavior of second- 
order methods. Athuraliya and Low [20] developed such a scaled subgradient method for the pure 
flow control problem. Their basic idea is to use an appropriately scaled subgradient projection 
to approximate the diagonal terms of the Hessian matrix, while retaining their distributed nature. 
Although the empirical convergence rate can be improved by using this approach, it does not achieve 
the same theoretical rate gain as second-order methods. 



3 Network Model and Problem Formulation 

We first introduce notation for matrices, vectors, and complex scalars used in this paper. We 
use boldface to denote matrices and vectors. For a matrix A, A T denotes the transpose of A. 
Diag {Ai, . . . , Ajv} represents the block diagonal matrix with matrices Ai, . . . , Ajy on its main diag- 
onal, diag {A} represents the vector containing the main diagonal entries of A. We let (A)jj represent 
the entry in the i-th row and j'-th column of A. We let I denote the identity matrix with dimension 
determined from the context. A >~ represents that A is symmetric and positive definite (PD). 1 
and denote vectors whose elements are all ones and zeros, respectively, where their dimensions arc 
determined from the context. (v) m represents the m-th entry of any vector v. For a vector v and a 
matrix A, v > and A > mean that v and A are element-wise nonnegative. 

In this paper, a multi-hop network is represented by a directed graph, denoted by Q = {J\f, £}, 
where N and C are the set of nodes and links, respectively. We assume that Q is connected. The 
cardinalities of the sets M and C are \M\ = N and \C\ = L, respectively. 

We use the so-called node- arc incidence matrix (NAIM) [21] AG M. NxL to represent the network 
topology of Q. The entry in A is defined as follows: 

1, if node n is the transmitter node of link I, 
(A-) n i = < —1, if node n is the receiving node of link I, (1) 
0, otherwise. 

In the network, different source nodes send different data to their intended destination nodes 
through multi-path and multi-hop routing. Suppose that there is a total of F sessions in the network, 
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Figure 1: A network example to illustrate the structure of A and b^). 

representing F different commodities. We denote the source and destination nodes of session /, 
1 < / < F as Src(/) and Dst(/), respectively. The source flow rate of session / is denoted by a 
scalar Sf € R + . For session /, we use a source-destination vector vector by € R N to represent the 
supply-demand relationship of session /. More specifically, the entries in by are defined as follows: 



(b/)n = < 



1, if node n is a source node of session /, 
— 1, if node n is a destination node of session /, 
0, otherwise. 



(2) 



As an example, we use the 5-node 6-link network depicted in Fig. 1 to illustrate the structure of A 
and b^. This network example will also be used throughout this paper to illustrate other concepts 
and their associated networking insights. According to the definitions of A and b^), we have 
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It can be seen that a distinct feature of A and is that each column has exactly two non-zero 
entries: a "1" and 

( f) 

For every link I, we let x) > represent the flow amount of session / on link I. We assume that 
the network is a flow-balanced system, i.e., the following flow balance constraints hold at each node: 
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Figure 2: A« and are obtained by delet- Figure 3: A^ 2 ) and b^ 2 ) are obtained by delet- 
ing the destination node of f±, i.e., node 3. ing the destination node of /2, i.e., node 5. 



£ 4 f) - E A S) = Sf, ifn = Dst(/), (5) 

leT(n) leO{n) 

where O (n) and I (n) represent the sets of outgoing and incoming links at node n, respectively. We 
define = \x^\ ■ ■ ■ , x^] T G R L as the routing vector for session / across all links. Using the 
notation A, bj, and x^, the flow balance constraints above can be compactly written as 

AxW - S/ b, = 0, V/ = 1,2,...,F. (6) 

Moreover, upon taking a closer look at the linear equation system in (6), it is easy to see [21] that 
the coefficient matrix A is not full row rank (because all columns sum up to zero). To eliminate the 
redundant rows in A, we let A^) € 'M.( N ~ 1 ) xL be obtained by deleting from A the row corresponding 
to the node Dst(/). It is easy to verify that is of full row rank [21]. Also, we let € M^ -1 
be obtained by deleting from the entry corresponding to the node Dst(/). For example, for the 
network in Fig. 1, by deleting the third and fifth rows (i.e., corresponding to deleting the nodes 3 
and 5, see Figs. 2 and 3), we can write A^, A^ 2 ), and b^ 2 ) as follows: 
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We assume that each link in the network is capacitated. The capacity of link I, denoted by C/, 
is assume to be fixed, which models conventional wireline networks or wireless networks with static 
channel and fixed transmit power. Since the total network flow traversing a link cannot exceed the 
link's capacity limit, we have 

F 

£> (/) <Q, VZ = 1,...,L. 
/=i 

We associate a utility function Uf : M+ — > M with each session /, i.e., Uf(sf) denotes the utility 
of session / as a function of the session rate sj. We assume that the utility functions are additive so 
that the overall network utility is given by ^2j = \ Uf(sf). We also assume that the utility functions Uf 
are strictly concave, monotonically increasing, twice continuously differentiable, and reversely self- 
concordant (see [7] for the definition of self-concordance). Our objective is to maximize the sum of 
utilities of all sessions. Putting together the routing and flow control constraints described earlier, we 
can formulate the joint multi-hop routing and flow control optimization (MRFC) problem as follows: 

MRFC: 

F 

Maximize Uf(sf) 
/=i 

subject to A^xW-s/b^^O, V/ = 1,...,F 

F 

J>, (/) <Ci. VZ = 1,...,L 

/=i 

xJ /} >0, V/,/; S/ >0, V/. 

Note that, in MRFC, the objective function is concave and all constraints are linear. Hence, this 
problem is a convex program and can be solved by using standard convex programming methods. 
Moreover, due to a decomposable structure in the dual domain, it is well-known that the MRFC 
problem can also be solved in a distributed fashion based on a Lagrangian dual decomposition and 
subgradient optimization framework. In the next section, we will briefly review the dual subgradient 
method, which will later be compared with our proposed Newton method. 

4 Dual Subgradient Method for Solving MRFC: A Quick Overview 

Since MRFC is a linearly constrained convex program, it can be equivalently solved in its dual domain 
because of a zero duality gap. To solve the MRFC problem in its dual domain, we first slightly modify 
the first constraint in MRFC as an inequality constraint A^'x^ — Sfb^ > 0. This modification 
does not affect the solution at optimality and can be interpreted from a network stability perspective 
(i.e., total service rate at each node is no less than the total arrival rate). Then, by associating a dual 
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(f) 

variable kJi > for all n, f and rearranging terms in the Lagrangian, it can be shown that the dual 
function can be written as 

en = f> c (42 (/) ) +£ e *KV «£ w )> 
f=i 1=1 

where Opc(^grc(/)) and @r(^ m Xx\o' u Rx(o) are res P ec ti ver y corresponding to the flow-control sub- 
problem at node Src(/) (transport layer) and the routing subproblem at each link / (network layer): 

@fc(42(/)) =^{Uf(sf)-ugl {fff \s f >0}, (7) 



©r(4VA)) " max { E - ^(O^V > O^nd^^ < C, V/}. (8) 

/=i /=i 

The dual problem can be written as 

Minimize 0(u) , s 

(9) 

subject to u > 0. 

Due to this separable structure, the dual function 6(u) can be evaluated by computing 0pc (""sr^/)) 

and Br ^xi(i) ' u r1(i) ) ^ or eacn source node and each link, respectively. The optimal dual variables 
u* can be iteratively computed by using the subgradient method as follows: 

4% + l) = max{4%)-A4%),0}, Vn,/, (10) 

where 7r k > is a step size chosen at the A;-th iteration and dn\k) is a subgradient at the A;-th 
iteration, which can be computed as 



Eieo(n) 4 f \k) - Eiex(n) x^k), if n + Src(/), 
lE/ e0 (n)^ (/) W-^W> ifn = Src(/). 

It can be seen from (11) that the subgradient d)i (k) can also be computed at each node in a 
decentralized fashion. 

There are several interesting networking insights in the subgradient-based first-order method. 

if) ■ 

First, the dual variables jj„ can be interpreted as the price charged to session / by node n. For 

example, if the subgradient component d\i (k) = J2ieO(n) x i (k) — Yliextn) x i (^) < i.e., the 

( f) 

stability condition is violated, then the price Wn will increase in the (k + l)-st iteration, thus 
discouraging session / from passing through node n. Second, it can be seen that by dividing the step 
size 7r k on both sides of (10) and letting Qn\k) = UrP /ir k , we have Qn\k + 1) = max{<3^(fc) - 
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Ei 6 o(„) A f) (*) + E Je x(„) ^ (/) (fc) , 0} if n ^ Src(/) or q1 /} (fc + 1) = max{Q^P (k) - £, e0(n) (A;) + 
s/,0} if n = Src(/). This is exactly the queue length evolution of session / at node n. Thus, the 
dual variables and queue lengths are intimately related (differ by a scaling factor). Lastly, it is easy 
to see that the knapsack-type subproblem @r ^Tx(z) ' n Rx(/)) a dmfts a trivial solution: for the given 
link I, pick a session that has the largest (it^i^) — -u^^) -value quantity, say /*, and let /* use up the 
link capacity C\. This is exactly the same strategy used in the celebrated "back-pressure" algorithm 
that was first discovered in [4], even though its throughput-optimality was established using tools in 
control theory. 

However, despite its simplicity and interesting networking interpenetrations, the subgradient 
method usually does not work well in practice due to its slow rate of convergence and sensitivity 
to step-size selection. 



5 Centralized Newton Method: A Primer 

In this section, we provide some preliminary discussion on using the conventional Newton method 
to solve Problem MRFC, along with a problem reformulation and an analysis of the challenges in 
developing distributed algorithms. 



5.1 Problem Reformulation 

To facilitate the development of a distributed Newton method for solving MRFC, we need to refor- 
mulate MRFC into a form that only has equality constraints so that the Newton method can be 
readily applied. Following the standard approach used for interior point methods [17], we employ a 
logarithmic barrier function to represent each inequality constraint including non-negativity restric- 
tions, and we accommodate this within the objective function. The augmented objective function is 
rewritten as follows: 



L F 



/(y) = -tjz Uf(sf) - E log (d - E x\ f) ) - E iog(«/) -EE lo ^i 
f=l 1=1 v f=l / f=l 1=1 f=l 



(/)> 



where y = si • • • sp, (xS ') 



(i)yr ... 



x 



r 



6 and where t > is a parameter that permits 



us to track the central path in the interior point method as t — > oo [7]. Furthermore, we denote 
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b (i) 



b(^) 



_ A (i) 



■AW 



-l)Fx(L+l)F_ Clearl y ; M is of f ull row rank 



because of its block diagonal structure with each block being of full row rank. With this notation, 
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we can rewrite the revised Problem MRFC as follows: 

R-MRFC: 

Minimize /(y) (12) 
subject to My = 0. 

Observe that the approximation accuracy of R-MRFC can be controlled by t: as t increases, the 
first term in /(y) dominates the barrier functions and R-MRFC becomes a better approximation of 
MRFC. In fact, it can be shown that the approximation error is bounded by K/t [7], where K is 
some constant depending on problem specifics. In other words, as t — >■ oo, the solution of R-MRFC 
converges to that for the original problem. However, one may suspect that, as t increases, solving 
R-MRFC may become more difficult due to numerical instability. Interestingly, this is not the case 
as long as /(y) satisfies the so-called self-concordance conditions, which has been guaranteed because 
—[//(•) and all barrier functions are self-concordant (we refer readers to [7,17] for more detailed 
discussions on self-concordance). 

5.2 Centralized Newton Method for the Reformulated Problem 

Starting from an initial feasible solution y°, the centralized Newton method searches for an optimal 
solution using the following iterative rule: 

yfc+^yfc + ^Ay*, (13) 

where 7r fc > is a positive step-size. In (13), Ay fc denotes the Newton direction, which is the solution 
to the following linear equation system (obtained by deriving the Karash-Kuhn- Tucker (KKT) system 
of the second-order approximation of /(y) [6,7]): 



" H fe M T ' 




' Ay fe " 




' V/(y fe ) " 


M 




w fe 








(14) 



where H fc = V 2 /(y fe ) G M.( L + 1 ) Fx ( L + 1 ) F [ s the Hessian matrix of f(y) at y k , and the vector w fc G 
^(N-i)F con tains the dual variables for the flow balance constraints My = at the k-th iteration. 

It can be easily verified that the coefficient matrix of the linear equation in (14) is nonsingular. 
Therefore, the primal direction Ay k and the dual variables w fe can be uniquely determined by solving 
(14). However, solving Ay fc and w fc simultaneously via (14) requires global information. A key step 
towards solving (14) in a decentralized manner is to rewrite it as follows: 

Ay fe = -H- 1 (V/(y fe ) + M T w fc ), (15) 
(MH^ 1 M T )w fc = -MH fc 1 V/(y fc ). (16) 
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Thus, given y k , we can solve for w fc from (16), and hence, solve for Ay k from (15), which can thus 
be used in (13) along with an appropriate step-size ir k . However, as we shall see in Section 6, the 
inverses of H fc and (MH^M 7 ) still require global information. This is unlike the pure flow control 
problem in [13] and the pure minimum cost routing problem in [12], where the Hessian matrix 
is diagonal and can be readily computed using local information. Therefore, in the next section, our 
goal is to further reformulate R-MRFC so that we can design an iterative scheme to compute Ay fc 
and w fc in a distributed fashion. 



6 Distributed Newton Method 

In this section, we will present the key components of our proposed distributed Newton method. We 
will first further reformulate Problem R-MRFC in Section 6.1, following which we will introduce some 
basic second-order structural properties of the reformulated problem in Section 6.2. The distributed 
computations of the primal Newton direction and the dual variables will be presented in Sections 6.3 
and 6.4, respectively. Finally, in Section 6.5, we shall discuss some other implementation issues (i.e., 
information exchange mechanism, initialization, stopping criterion, and step-size selection). 

6.1 Problem Rearrangement 

The first major hurdle in decentralizing the Newton method is the coupled structure in the Hessian 
matrix of the MRFC problem. To see this, we start by evaluating the first and second partial 
derivatives of /(y). Noting that /(y) is separable with respect to each flow / and link I, the only 
non-zero partial derivatives are: 

df(y k ) , 1 df(y k ) 1 1 
— = -tUAs f ) , V/, nr = ^ (Jr) — m ' 

ds f r s f dx\ f) Ci-Yf fl=1 x\ n x\ f) 

d 2 f{y k ) „ 1 d 2 f{y k ) 1 1 

- -tU'Usf) + — , V/, = — — jj^— + — 77T-, V/,/, 



d 2 f{y k ) 1 



dxMdz™ (Cj -£/'=i x\ n ' 



2 



V/,/i// 2 . 
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For convenience, we use Si = Ci — J2f=i x i^ to represent the unused link capacity of link I, which will 
occur frequently in the rest of the paper. We also define three types of diagonal matrices: 



D = Diag {-tU'j(s f ) + ^ , / = 1, . . . , f| , 



1 



1 



D ^= Diag 1^ + ^ 2 ' 



1 = 1,. ..,L\, 



D 2 = Diag {-p,l = l,- 



Then, it can be verified that has the following structure: 
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H fe = V 2 jV) = 
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D 2 
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D 2 
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• Df) 



Due to this non-diagonal and coupled structure with respect to the different sessions, H^ 1 cannot be 
computed separably for each source node and each link. Fortunately, this problem can be tackled by 
reformulating R-MRFC as described next. 

First, from the partial derivatives computation, we note that /(y) is separable with respect to 
each link. This prompts us to rearrange y and M based on links as follows: 
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I (1) 

sf \x\ ■ ■ ■ x\ 



(1) 



T 



eRl i+1 ' F , and 
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B Ai 



where 



b(D 



b(^) 



and = 



—a 



(i) 
i 



, I — 1, . . . , L, 



and where in the definition of A;, the vector a|^ is the Z-th column in the matrix A^, i.e., A*^ = 

(/) .-,(/) . . . „(/) TTr»r ovamnle fr*r fVm n £>+wn-rl^ in TTirr 1 Qln d £L^^ Can be Written aS 



, for the network in Fi£ 


?■ 1, 


a (1) 




h 




k 


m 




n\ 





= n 2 


-I, af = 


n2 


1 







™3 





n 5 





ri4 
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As a result, R-MRFC can be equivalently re-written as follows: 



R2-MRFC: 

Minimize /(y) 
subject to My = 0. 



(17) 



By the same token as in the previous section, the Newton direction of R2-MRFC is the solution to 
the following linear equation system: 



(18) 



where w fc represents the dual variables for the flow balance constraint My = 0. Here, the entries in 
w k are arranged as [(wj^) T , . . . (wj^) T ] r , where wjj^ is in the form of 



Hfc 


M T " 




' Ay fc " 




' V/(y fc ) " 


M 







w fe 








#(/) a r~(/) ~lf) -(/) -if)]? 

w fc - [w-l , . . . , w Dst(/) _ 1 , ™ Ds t(/)+l ' • • • ' W N J 



iN-1 



(19) 



Note that in (19), we have dropped the iteration index k within [-]for notational simplicity. For the 

same reason, in the rest of the paper, the iteration index k will be dropped whenever such an omission 

~( f) 

does not cause confusion. Also, we let w^stf/) = ^> ^ or a ^ f ' ^ s we snan see ia t er ) this helps simplify 
the closed-form expressions in Theorem 6.7. More detailed discussions on the physical meaning of 
the dual variables w fc will also be provided in Section 6.3. 

6.2 Basic Second-Order Properties of and 

In formulating Problem R2-MRFC, we have introduced two new vectors, namely, and b^. Here, 
we will first study some of their basic second-order properties, which will be used extensively later in 
designing a distributed Newton method. Most of these properties can be verified using simple matrix 
computations based on the definitions of a>^ and b^. Thus, we omit the formal proofs of these 
properties. First, we have the following basic second-order property for b(/): 

Lemma 6.1. The rank-one matrix b^(b^) T has the following structure: 

1> if i = j o,nd (hW)i corresponds to Tx(Z). 



(b(/)(b(/)f) 



0, otherwise. 



(20) 



If) 



For a.y ' , we have the following two second-order properties. 
Lemma 6.2. The rank-one matrix a|^(a|^) T has the following structure: 
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Case 1: If none of link I's two end points is the destination node of flow f, i.e., Tx(Z),Rx(Z) ^ 
Dst(/), then &\ f \&f ) ) T has four non-zero entries, where 



1) if i = j > ( a ; )« corresponds to Tx(Z) orRx(Z), 

( f) ( f) 

— 1, if i ^ j, (a] )i corresponds to Tx(Z) and (a] , )j corresponds to Rx(Z), 

or (aj | )j corresponds to Tx(Z) and (aj , )j corresponds to Rx(Z), 

0, otherwise; 



Case 2: If link I's receiving node is the destination node of flow f, i.e., Rx(Z) = Dst(/) then 
a^(a^) T has one non-zero element, where 



(aj»(a«f).. 



( f) 

1) if i =3 md (aj ) \ corresponds to node Tx(/), 
0, otherwise. 

(/i)/„(/ 2 )xT 



(21) 



Lemma 6.3. T/ie rank-one matrix a ; (a ; ) /ias £/ie following structure: 

• Case i: // none of link I's two end points is the destination node of either flow f\ or flow f2, 
i.e., Tx(Z),Rx(Z) + Dst(/i) and Tx(/),Rx(/) ^ Dst(/ 2 ), tfien a/ (/l) (a{ /2) ) T /ias /onr non-zero 
entries, where 

1, i/ (a|^)i ana 1 (a^ 2 - 1 ).,- &o£/i correspond to Tx(/), 
or (a ; 1 )j and (aj f )i &oi/i correspond to Tx(Z), 
— 1, z/ (aj r )i corresponds to Tx(Z) and (aj ; )j corresponds to Rx(Z), 
or (a^^)j corresponds to Tx(Z) and (ap^)j corresponds to Rx(Z), 
0, otherwise; 



• Case 2: If link I's receiving node is the destination node of either flow f\ or flow $2, i.e., 
Rx(Z) = Dst(/i) or Rx(/) = Dst(/2), then a^(a|"^) T has two non-zero elements, where 



(a<«(, 



1, if and (a.^ 2 ^)j both correspond to Tx(/), 

— 1, if (aj )j corresponds to Tx(Z) and (aj ; )j corresponds to Rx(Z), 
or (a^ 1 ^ corresponds to Rx(/) and (a^)j corresponds to Tx(/), 
0, otherwise. 



• Case 3: If the two end nodes of link I are respectively the destination nodes of flow f\ and flow 
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ji, then ap^(ap^) T has one non-zero element, where 



-1, */(*! 



,(/ 2 ) 

0, otherwise. 
For example, for the network in Fig. 1, we have 



corresponds to Tx(i) and (aj )-,- corresponds to Rx(Z), 

(/ah 



or (a )i corresponds to Rx(/) and (a f )_j corresponds to Tx(Z), 





m 


n 2 


n4 


n 5 


ni 


© 





(d) 





™2 
















© 





© 





n 5 















where Tx(Zi) = ni and Rx(Zi) = n&. It can be seen that there are four non-zero entries in aj 1 ^(aj 1 ^) T . 
As stated in Case 1 of Lemma 6.2, the entries on the main diagonal corresponding to n\ and are 
equal to 1. Also, the off-diagonal entries that correspond to n\ and are —1. 
On the other hand, it can be verified that 





ni 


n 2 


n4 


n 5 


m 



















© 






















n 5 















4"(4") T - 



where Tx(Zs) = n2 and Rx(Zs) = n3 = Dst(/i). It can be seen that there is only one non-zero entry 
in (&ip) T . As stated in Case 2 of Lemma 6.2, the entry on the main diagonal corresponding to 
n2 is equal to 1 and all other entries are zeros. 
Also, we can verify that 





m 


n 2 


ri3 


n4 


m 


© 








(© 


n 2 














n4 


© 








© 


n 5 















where Tx(/i) = n\ and Rx(Zi) = and neither n\ nor n4 is equal to Dst(/i) or Dst(/2). In this case, 
it can be seen that there are four non-zero entries in a^(a^) T . As stated in Case 1 of Lemma 6.3, 
the entry whose row and column correspond to n\ is equal to 1 (the same is true for 714). Also, the 
two entries whose row and column respectively correspond to n\ and n4 are equal to —1. 
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On the other hand, it can be verified that 





ni 


"-2 


n 3 


n 4 


ni 



















® 






















n 5 





© 









where Tx(7«4) = n 2 and Rx(Lt) = n§ = Dst(/2). It can be seen that there are only two non-zero entries 
in &1\&^) T . As stated in Case 2 of Lemma 6.3, the entry whose row and column correspond to n 2 
is equal to 1 . Also, the entry whose row and column respectively correspond to n§ and n 2 is equal 
to 1. 

Finally, it can be verified that 





n x 


n 2 


™3 


n 4 


ni 














n 2 














714 














n 5 








© 






where Tx(^) = «5 = Dst(/2) and Rk(Iq) = = Dst(/i). Thus, as stated in Case 3 of Lemma 6.3, 
we have that the entry whose row and column respectively correspond to and is equal to —1. 



6.3 Distributed Computation of the Primal Newton Direction 

By solving (18), we have 

Ay* = -Hfc 1 (V/(y fc )+M T w fc ), 
(MH fc 1 M T )w fe = -MH^V/^). 



(22) 
(23) 



Now, consider the Hessian matrix of Problem R2-MRFC. As mentioned earlier, since f{y k ) is 
separable based on links, the Hessian matrix for the rearranged Problem R2-MRFC has the 
following block diagonal structure: 

H fc = Diag {S, Xx, . . . , X L } G R (Wx(L+i)F > 

where S is a diagonal matrix defined as S = Diag ■! —tU'{{s{) + ©,..., —tU'lisp) + © \ & R FxF ; 
and where X; € M© xF is a symmetric matrix with entries defined as follows: 



( X 0/l,/2 = < 



(,<*>) 



1 

{6? 



7 if h = h, 
if fi + h- 



(24) 
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It then follows from the block diagonal structure of Hfc that 

H-^DiagjS-Sxr 1 ,...^ 1 }. 

Noting that S _1 is diagonal, we have S _1 = Diag < 1 t , . . . , 1 1 >. Notice further 

—W 1 (si) + -z —tU F (s F ) + ^ r 

(. s l a F ) 

that each source node / has the knowledge of Sf, which implies the following result: 

Lemma 6.4 (Distributedness in computing S _1 ). The computation of S _1 does not require global 
information and can be computed source node-wise in a distributed fashion. Moreover, the inverse of 

S is given by S 1 = Diag ' — — - — v 



tu['( si )+^ ' • • • ' -tU F (s F )+- 



"F 



Next, we consider the computation of X^ 1 . Note that X; only involves variables xf\ f = 1, . . . , F, 
which are available at each link locally. Hence, we have the following result: 

Lemma 6.5 (Distributedness of computing X ; _1 ). The computation ofXf 1 does not require global 
information and can be computed link-wise in a distributed fashion. 

r ~\T 

For convenience, we define a new vector % = . . . , xf\ 5\ € Note that this (F + l)- 

dimensional vector only has F degrees-of-freedom (DoF) and its Li-norm (noting positive components 
due to the barrier function) is a constant C\. Then, by exploiting the special structure of X; in (24), 
we can show that X z _1 can be computed in closed-form as follows: 

Theorem 6.6 (Closed-form expression for X^ 1 ). The entries o/X^ 1 can be computed in closed-form 
as follows: 



( X 2 1 )fif2 = < 



V „ l|x '" / (25) 



_(/l)_(/2)V 

' } ifl<f u f 2 <F, h + h- 



ll x il 

The basic idea of the proof of Theorem 6.6 is based on a keen observation of the decomposable 
structure of X; in (24) and the Sherman-Morrison- Woodbury formula [6] . We relegate the details of 
the proof to Appendix A. 

Combining Lemma 6.4, Lemma 6.5, Theorem 6.6 and all related discussions earlier, we can con- 
clude that H^T 1 , the inverse of the Hessian matrix of Problem R2-MRFC, can be computed distribut- 
edly as shown in the following theorem: 

~ (f) 
Theorem 6.7. Given dual variables w, the Newton direction Asj and Axf for each source rate 

( f) 

Sf and link flow rate x\ can be computed using local information at each source node s and link I, 
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( f) 

respectively. More specifically, Asf and Ax\ can be computed as follows: 



8f(tS f U' f (8f) + l-S f wW ) 



As f = 



Ax 



(/) _ (l ( fl) 2 



1 - tsjU'>(s f ) 

if)\2 



1 



X 



l X /| 




x 



C/'h2 



l x z| 



(/) J, + W Tx(0 W Rx(0 I + 



m - 7" + - w - f ' J 



V/, 



(26) 



V/,/. (27) 



The key steps of proving Theorem 6.7 are: (i) applying H^ 1 = Diag {S -1 , Xj -1 , . . . , X^ 1 }, 
Lemma 6.4, Lemma 6.5, and Theorem 6.6 in (22), and (ii) exploiting the special structure of and 
b^ to simplify the result. We relegate the proof details to Appendix B. 

Remark 1. An important remark for Theorem 6.7 is in order. Besides providing a closed-form 
expression for a distributed primal Newton direction computation, it also provides an interesting 
networking interpretation. Here, we can think of the difference of the dual variables (Wpx(z) ~~ ^Rx(z) ) 



in (27) as being similar to the queue length difference in the "back-pressure" algorithm, although 
Wn cannot be exactly interpreted as queue length (since it can be positive or negative). Note that 



and 



(*, (/) ) 2 



-If) 



are all positive quantities. Hence, if the positive (^ x (i) 



^i2(o)- value 



in (27), 1 

outweigh the negative ones, i.e., the "pressure" on the transmitter side of link I is greater than the 

( f) 

"pressure" on the receiver side, then x\ will be increased in the next iteration. Note also that, 

(f) 

unlike in first-order methods, the decision to increase or decrease xf at link I considers not only the 
"pressure difference" of flow / but also the "pressure difference" from other flows at link I (via an 
appropriate weighting scheme as evident in (27)). 



6.4 Distributed Computation of the Dual Variables 

As mentioned earlier, given a primal solution y fe at the k-th iteration, the dual variables w fc may be 
computed using (23). However, solving for w fc using (23) cannot be implemented in a distributed 
fashion because computing the inverse of the matrix MH^M 7 requires global information. In what 
follows, we will first study the special structure of MH^M 7 . Then, we will show that the matrix 
splitting scheme in [13] can be generalized to compute the dual variables w fc as per (16). 



Recall that M can be written in a partitioned matrix form as M 



B Ai 



. Hence, 
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we can decompose MH fc 1 M T as 



MH^M 1 " 



B Ai A, 



xr 1 



X 



-1 



BS- 1 B T + J]A z X- 1 Af. 



1=1 



(28) 

Now, we consider each term in the decomposition in (28). For BS X B T , since B and S _1 are diagonal, 
we have 



BS : B T = Diag 



— b (1) (b«f,..., 



-tUUsF) + j^jj 



— b^b^ , (29) 



which is a block diagonal matrix. Moreover, by Lemma 6.1, each bock has the following structure: 

1 



— Diag{0...1...0}, 

-tU f (s f ) + jjjp 

where the position of the only non-zero entry 1 corresponds to node Src(/). 

Next, consider the term AjX f _1 A^, which is more involved. From Theorem 6.6, we can 

decompose Ya=i A/X^ -1 A^ as follows: 



i=i 



i=i 



„(1)A2 



\ 



. T 



i=i 



V 



(^4 



(30) 
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Due to the block diagonal structure, the first term in (30) can be further written as 

£ (A,Diag {(^f, . . . , (^) 2 } Af) = ^Diag {( X ^\^f , . . . , (^f^^f} , 
i=i i=i 

which is also a block diagonal matrix. Thus, we can combine this term with BS _1 B T . For con- 
venience, we let D = BS ^B T + Yd=i A* Diag {(x, (1) ) 2 , • • • , (xJ F) ) 2 } Af . Clearly, D is also block 
diagonal, and can be written as D = Diag {Di, . . . , Dp }. Then, by using Lemma 6.2, we obtain the 
following result, where the proof is relegated to Appendix C. 

Lemma 6.8. The matrix D is block diagonal and each block Dy on the main diagonal has the 
following structure: 
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The diagonal entries (Df)u are given by 
(Df)u = < 



S«eO(n)UX(n)( X F^) 2 + 



if row i corresponds to node n and n = Src(/), 



XieO(n)ux(nM f) ) 2 otherwise 
The off-diagonal entries of (Df)ij, i ^ j, are given by 



X^er(ni n 2 )( x \^) 2 */ row * an d c °l umn j correspond to two connected nodes n\ and 



(Pfh 



otherwise, 
where T{n\,n2) = {/ G C : Tx(/) = n\ and Rx(/) = n 2 , or Tx(Z) = n 2 and Rx(/) = n\}. 

Next, we study the second term in (30), denoted as C, which is symmetric and can be shown to 
have the following partitioned structure using Theorem 6.6: 



i=i 



i 



i x «i 



V 



(lh4 



where 



G21 D2 
Gfi Gf2 



L < (/)\4 

1=1 l|Xi|1 



■ ■ Gif 

■ ■ G2F 



D, 



(4 F) ) 4 



G 



/1/2 



E 



f (/l) (/2h2 



l x zl 



f = l,...,F, 

f 1 ,f 2 = l,...,F,f 1 ^f 2 . 



Noting the similarity between D/ and Dj, and by using Lemma 6.2 and following a similar derivation 
to that for Lemma 6.8, we obtain the following result for characterizing Dj, where we omit the proof 
to avoid repetition. 

Lemma 6.9. The matrix Dj has the following structure: 
• The diagonal entries (Df)a are given by 



(Of), 



E 



leO(n)Ul(n) 



Sf)\4 



21 



The off-diagonal entries of (Df)ij, i ^ j, are given by 



(Pfh 



E 



K (/) ) 4 



Zer(ni,n 2 ) ||x ( | 







if row i and column j correspond to two connected nodes n±,n2, 
otherwise. 



Using Lemma 6.3, we can also characterize the structure of Gf ± f 2 as stated in Lemma 6.10 below, 
where the proof follows that of Lemma 6.8 and is therefore omitted for the sake of brevity. 

Lemma 6.10. The matrix Gjjj 2 has the following structure: 



( G /i/ ; 



2)13 



S/eO(n)UX(n) 
— YlleT(n 1: n 2 ) 





P7TP 



if row i and column j correspond to the same node n, 

if row i and column j correspond to two connected nodes ni,ri2, 

otherwise. 



So far, we have characterized the structures of Dj and Gf x f 2 . Hence, the structure of C is also 
known. Finally, recall that MH^M 7 = D — C. Therefore, combining the previous derivations, we 
have the following result for the structure property of MH^M 1 *. 

Theorem 6.11. The matrix MH t _1 M T can be written as the following partitioned matrix: 



MH" 1 M T = 



Di-Di -G12 
— G21 D2 ~~ L>2 



—Gfi 



— Gf2 



— G\f 

—G2F 

Dp — D; 



where the structures properties of the matrices Dj ; Y)f, and Gf x f 2 are specified in Lemma 6.8, 
Lemma 6.9, and Lemma 6.10, respectively. 

Armed with Theorem 6.11, we are now in a position to design a distributed iterative scheme to 
compute the dual variables using matrix splitting techniques. To this end, we first introduce some 
preliminary results in matrix splitting theory. 

Historically, the idea of matrix splitting has its origin in designing iterative schemes to solve linear 
equation systems [9]. Consider a consistent linear equation system Fz = d, where F G M nxn is a 
nonsingular matrix and z,d € W 1 . Now, suppose that F is split into a nonsingular matrix Fi and 
another matrix F2 according to F = Fi — F2. Also, let z° be an arbitrary starting vector. Then, a 
sequence of approximate solutions can be generated by using the following iterative scheme: 



5 fc + x = (Ff 1 F 2 )z fe + Ff 1 d, k > 0. 



(31) 
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Generally, Fi should be an easily invertible matrix (e.g., diagonal, etc). It can be shown that this 
iterative method is convergent to the unique solution z = F _1 b if and only if the spectral radius of 
the matrix F 1 ~ 1 F2 is less than one, i.e., p(F^ 1 F2) < 1, where p(-) represents the spectral radius of 
a matrix. The following result provides a sufficient condition for p(F 1 ' 1 F2) < 1 (see [9, 13] for more 
details). 

Lemma 6.12. Suppose that F is a real symmetric matrix. If both matrices F\ + F2 and Fi — F2 are 
positive definite, then p(F^ 1 F2) < 1. 

Lemma 6.12 suggests that the convergence property of a given matrix splitting scheme can be 
verified by checking for the positive definiteness of the identified matrix. The following result provides 
a sufficient condition for checking positive definiteness based on diagonal dominance [22, Corollary 
7.2.3]: 

Lemma 6.13. If a symmetric matrix Q is strictly diagonally dominant, i.e., \(Q)u\ > J2j^i \(Q)ij\> 
and if (Q)u > for all i, then Q is positive definite. 

We are now ready to use the matrix splitting scheme in (31) to compute w fc . First, we let A be 
the diagonal matrix having the same main diagonal of MH^ 1 M T , i.e., 

A fc = Diag {diag {mH^M t } } . (32) 

We let fi denote the matrix containing the remaining entries after subtracting A/% from MH^M 7 , 
i.e., 

n k = MH" 1 ^ - A fe . (33) 
Further, we define a diagonal matrix fife where the diagonal entries are defined by 

(n fc )ii = X;i(n fc )i J -|. (34) 

Then, we can split MH^ 1 M T as (Afe + afifc) — (aftf- — fik), where a > \ is a parameter that serves 
the purpose of tuning convergence performance. Based on this splitting scheme, we have the following 
result: 

Theorem 6.14. Consider the matrix splitting scheme MH^M 1, as MH^M T = (A& + afi&) — 
(aflk — fife), where A^, fife, and fife are defined in (32), (33), and (34), respectively. Then, the 
following sequence {w fe } generated by 

w fc+1 = (A fc + alife)- 1 (arifc - fife)w fc + (A fc + an^^-MH^V/ljf*)) (35) 

converges to the solution of (23) as k — > 00. 
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By Lemmas 6.12 and 6.13, the key to proving Theorem 6.14 is to verify that both the sum and 
difference of the two components in the splitting scheme are strictly diagonally dominant. We relegate 
the proof details to Appendix D. 

Remark 2. The matrix splitting scheme in Theorem 6.14 is inspired by, and is a generalization of, 
the scheme in [13]. The goal of both splitting schemes is to construct a diagonal nonsingular matrix 
(Afe + ctfifc in our paper) for which the inverse can be separated and easily computed by each node (as 
in our case) or each link (as in [13]). However, our matrix splitting scheme differs from that in [13] 
in the following aspects. First, since (A*. + aflk) is not element-wise non-negative (c.f. [13]), the 
definition of the matrix in this work is different from that in [13], which also leads to a different 
proof. Second, we parameterize the splitting scheme (using a) to allow for tuning the convergence 
speed in (35), where the scheme in [13] is a special case of our scheme when a = 1. 

Some comments on the parameter a are addressed at this point. From (31), it can be seen that 
the solution error decreases in magnitude approximately by a factor of p(F^ 1 F2). Thus, the smaller 
p(F^ 1 F2) is, the faster the convergence we might expect of the iterative scheme. To this end, we 
have the following result for the selection of the parameter a. 

Proposition 6.15. Consider two alternative matrix splitting schemes with parameters a\ and cti, 
respectively, satisfying i < a\ < a.i- Let p ai and p a2 be their spectral radii, respectively. Then, 

Pcti Pct2 ■ 

Proposition 6.15 implies that in order to make the matrix splitting scheme converge faster, we 
should choose a smaller a, i.e., we can let a = |+e, where e > is small. The proof of Proposition 6.15 
makes use of the comparison theorem in [9] and we relegate its details to Appendix E. 

Next, we show that the matrix splitting scheme in Theorem 6.14 can indeed be implemented in a 
distributed fashion to solve the MRFC problem. For convenience, we define two types of link sets as 
follows: 

$(n) = 1 (n) UO(n), *(n, /) = {I £ I (n) U O (n) : Tx(/) = Dst(/) or Rx(Z) = Dst(/)} . 

We let 15(a) denote the set indicator function, which takes value 1 if a G S and otherwise. Then, 
we have the following result: 

Theorem 6.16. Given a primal solution y k , the update of the dual variable can be computed 

( f) 

using local information at each node. More specifically, w)t can be written as 

w(J\k + 1) = - J— (V^(k) + V^(k) - W£(k)), (36) 
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where Un (k), Vn'{k), and Wn ' (k) are, respectively, defined as 



rU) 



rU), 



U{P(k) 



Xf)\2 



Eze^n)! 1 + «(! - 1 *(n,/)(0)]( ; 
E ie<E , (n )[l + a(l-l*(n,/)(0)](^ (/) ) 2 ( 



(x< /} ) 2 



+ 



ll x ;|l 



1 IIX( 



.)+ 



+ 



i/n ^Src(/), 



i/n = Src(/), 



(37) 



i6X(n)\*(n,/) 



Rx ) (/)) + 



leO(n)\V(n,f) 
F 



((x 



l x n 



' 7 Tx(0 



/ rv(r (/) r (/,), > 2 \ 



(38) 



= E (( E 



wy)(fc) 



/'=l,^/ iex{n) 
„(/) 



{x\ f) x\ f) ? 
llxdl 2 



2^ 115,112 A W Rx(i) 



ZG0(n) 

r (/)->2 



(/') - 

Tx(I)- 



1 ~~ E/'=l -jj^ 
v (<[ v F ^ (/) ) 2 T cm 



F' } ) 



(39) 



ifn^Src(f), 



(40) 



v fi v F ^ (/) ) 2 T (/') 



+ '/n = Src(/). 



Theorem 6.16 can be proved by computing the element-wise expansion of (35). We relegate the 
proof details to Appendix F. 

Remark 3. There are several interesting remarks pertaining to Theorem 6.16. First, it can be seen 
from (37), (38), (39), and (40) that all the information needed to update Wn^ are either locally 
available at node n or at links that touch node n. This confirms that the matrix splitting scheme can 
be distributedly implemented. Second, suppose that a = 1; then it can be verified that V^ { involves 
a difference of quadratic terms of flow x\ coming into and going out of node n. This bears some 

resemblance to the dual update scheme in the subgradient method (cf. (10)). The key difference 

~(f) ~(f) 

is that all the quantities here are of second-order and are weighted by w txl — iv-j^y) > wmcn can be 
loosely interpreted as "back-pressure" (see Remark 1). Likewise, also involves a similar "back- 
pressure" weighting mechanism. But unlike , the second-order quantities in are related to 
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(ft (f) 

cross-session flow products x\ x\ . Third, although the dual update scheme within a second-order 
method is more complex at each node, the more rapid convergence rate of a second-order method, 
with its accompanying less information exchange, outweigh this local computational cost increase. 

6.5 Implementation of the Distributed Newton Method 

Although we have derived the main elements of a distributed computational scheme for obtaining the 
primal Newton direction and for updating the dual variables, which are key parts in our proposed 
distributed Newton method, there are a few open questions yet to be answered for practical imple- 
mentations. In what follows, we will discuss these issues, namely, the scale of information exchange, 
stopping criterion, step-size selection, etc. 

6.5.1 Information Exchange Scale Analysis 

We now analyze the required information exchange in our proposed distributed Newton method. We 
first consider the primal Newton direction update. From Theorem 6.7, we can see that to compute 
Asj, we need Sf and w^J^y Since Sf is available at Src(/), we can see from Theorem 6.16 that 
w Src(f) can a ^ so De computed at Src(/). Hence, there is no need for any information exchange in 
computing Asf. 

( f) f ~(f f ) ~( f f ) 

To compute Ax J , , we can see from (27) that we need x\ , / = 1, . . . ,F, w^J^y and w-^J^y 

f ~(f) 
Clearly, x\ , f = 1, . . . , F are already available at link /. From Theorem 6.16, we can see that w^, x ^ 

~(f) 

and w^J^ can also be computed using flow and dual information with respect to links that share 

( f) 

Tx(Z) and Rx(Z). This implies that computing Ax ; only requires exchanging information one-hop 
away from link I, as shown in Fig. 4. 

Next, consider the updating of dual variables. From Theorem 6.16, we can see that to compute 



-(/) 



Wn ' , we need \ w^J^y and w^Jyy where I G $(n), /' = 1, . . . , F. It is clear that xf \ I G $(n), 

~(f) ~(f) 

is readily available at node n. On the other hand, iw-px(z) ano - w Rx(i) are either available at node n 
itself or are available at nodes one-hop away from node n. This implies that computing w„ only 
requires exchanging information from nodes one-hop away from node n, as shown in Fig. 5. 

Two interesting remarks are in order. First, although the MRFC problem is more complex than 
the pure flow control problem in [13], the information exchange required for the distributed Newton 
algorithm for MRFC turns out to be more decentralized than that in [13]. More specifically, the 
information exchange for MRFC is from entities at most one-hop away, while in the pure flow control 
problem in [13], each source node needs to send information to all the links on its predefined route. 
This somewhat surprising result can be loosely explained by the fact that by allowing multi-path 
routing, the routing decision is automatically determined by the node "pressure" as described in 
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Figure 4: Information exchange for comput- Figure 5: Information exchange for com- 

(f) ~(f) 

ing Axf , which only requires exchanging in- puting w r l , which only requires exchanging 

formation from links one-hop away from link information from nodes one-hop away from 



Remark 1 at each node, thus alleviating the burden of exchanging information along the fixed routes. 
Second, we can see that our distributed Newton method requires a similar scale of information 
exchange to that in the subgradient method. 

6.5.2 Initialization of the Algorithm 

Another open question in the implementation is how to initialize the algorithm. One simple solution is 
as follows. Each Src(/) can choose an initial value e/ and equally distribute e/ along all of its outgoing 
links. Also, if each intermediate node has multiple outgoing links, then the sum of its incoming traffic 
will be equally distributed along each outgoing link as well. Clearly, if e/, V/, are small enough, then 
the constraint Ylf=i x i^ — &i can be satisfied. The initial values of dual variables can be chosen 
arbitrarily since they are unrestricted (e.g., a simple choice is to set wil^ = 1 if n ^ Dst(/) and 



6.5.3 Stopping Criterion 

Since the Newton method enjoys a quadratic rate of convergence, a simple stopping rule is to let all 
sources and links run the algorithm for a fixed amount of time. If the time duration is long enough 
for a given maximum sized network, then due to the rapid convergence speed, by the time the clock 
expires, it is with high probability that the algorithm will have converged to a very near-optimal 
solution. 

Another more sophisticated way to stop the algorithm can be based on the so-called Newton 
decrement [7]. In Newton methods, at a given primal vector y k , the Newton decrement is defined 



node n. 



if n = Dst(/)). 



as [7] 




(41) 
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which measures the decrease in the objective function value at each iteration. Thus, we can use 
A(y fe ) < e as a stopping criterion, where e is a predefined error tolerance. The following result shows 
that A(y fe ) can also be computed in a distributed fashion. Again, for ease of notation, we omit the 
iteration index k. 



Proposition 6.17. The Newton decrement A(y) can be computed as 

F -, x 1 

• (42) 



F L r F .(f) TP l 



m = ( v>»/) 2 ( - tu'SM + ^) + e e (^-) + I ( e a, 

v /=i v z=i L /=i x / /=i 

We remark that since (42) is separable with respect to each source node and each link, each 
source can compute the quantity (As/) 2 ^ — tU'J(sf) + j^jjz^ and each link can compute the quantity 



^ — tjy j + y ^2f = i AarJ J . Therefore, A(y fc ) can be computed distributedly using only local 

information. The proof of Proposition 6.17 is based on the decomposition structure of and we 
relegate the proof details to Appendix G. 

To compute the Newton decrement, we can see from (42) that each source needs Sf and Asj and 

(f) ( f) 

5, and Ax) . From earlier discussions on Asj, 



(f) A (f) 

each link needs x) and Ax, . From earlier discussions on Ast, we can conclude that no information 



(f) 

exchange is required at each source node. Also, from earlier discussions on Ax, f , we know that at 
most one-hop information exchange is required in this regard. However, to allow every source and 
link to compute the final value of the Newton decrement, every source and link will need to broadcast 
a packet containing the value of (Asf) 2 (^ — tU'j{sf) + j^jji^J and (~T7r) + 3[(S/=i^ x F^) > 
respectively, to the network. Thus, we can see that the more accurate termination time is obtained 
at the expense of a larger scale of information exchange across the network. 

6.5.4 Step-Size Selection 

As in the classical Newton method [6,7], when the iterates {y k } approach a close neighborhood of an 
optimal solution, using a fixed step size -ir k = 1 gives us the so-called quadratic rate of convergence, 
which is very efficient. If the iterates {y fc } is far from an optimal solution (which is also called 
"damped Newton phase" and can be measured by ||V/(y fc )||2 - see [7] for more details), then some 
inexact line search methods, such as the "Armijo rule" [6] (also called "backtracking line search" 
in [7]) or the step-size rule in [13] can be used. Due to the inexactness of these line search methods, 
the theoretical convergence rate would be sub-quadratic, but still in theory and practice, superlinear, 
and so, much faster than subgradient-type methods. 

To conclude this section, we summarize our distributed Newton method for the MRFC problem 
in Algorithm 1. 
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Algorithm 1 Distributed Newton Method for Solving MRFC 



Initialization: 

( f) 

1. Each source and link: Choose some appropriate values of Sf and x\ , , V/. 

~( f) 

2. Each node: Choose appropriate values of dual variables Wn , V/. 
Main Iteration: 

3. Update the primal Newton directions Asf and Axf using (26) and (27) at each source node 
and link, respectively. 

4. Update the dual variables u>„ using (36) at each node. 

5. Terminate the algorithm if some predefined running-time limit is reached or if the Newton decre- 
ment criterion is satisfied. Otherwise, go to Step 3. 




Figure 6: The error between the true solution of w fc in Eq. (23) and our matrix-splitting based 
iterative computation scheme. 

7 Numerical Results 

In this section, we present some pertinent numerical results for our proposed distributed Newton 
method. First, we examine the convergence speed of the parameterized matrix splitting scheme in 
Section 6.4. We use a 10-node 3-session network as an example. The initial values of w fc is set to all 
ones. We vary a from 0.55 to 1. The iterative scheme is stopped when the error between the true 
solution of w fe in Eq. (23) and our matrix-splitting based iterative computation scheme is less than 
1 x 10 6 . The error is shown in Fig. 6 (in log scale). We can see that for all values of a, the error 
decreases exponentially fast. Also, the smaller the value of a, the faster the convergence speed. More 
specifically, when a = 0.55, the number of iterations is approximately half of that when a = 1 (115 
vs. 232). This confirms our theoretical analysis in Proposition 6.15. 
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Figure 7: A five-node two- Figure 8: The optimal routing Figure 9: The optimal routing 
session network. solutions for session N4 — > Nl solutions for session N5 — > N3 

(inb/s/Hz). (inb/s/Hz). 

To show the details in our proposed distributed Newton method, we first study a five-node multi- 
hop wireless network as shown in Fig. 7. In this network, five nodes are distributed in a square 
region of 800m x 800m. The maximum power for each node is 100 mW. The path-loss index is set 
to 3.5. There are two sessions in the network: N4 to Nl and N5 to N3. We adopt log(s/) as our 
utility function, which represents the so-called proportional fairness [2]. The optimal routing paths 
for session N4 — > Nl and N5 — > N3 are plotted in Fig. 8 and Fig. 9, respectively. In Fig. 8 and Fig. 9, 
the marker and the ■ marker denote the source node and the destination node of each session, 
respectively. The optimal session rates for N4 — > Nl and N5 — > N3 are 9.31334 and 10.4670 b/s/Hz, 
respectively. The convergence behavior of the distributed Newton method is illustrated in Fig. 10, 
which shows the objective values of the approximating and the original problems. It can be seen that 
our proposed algorithm only takes 45 Newton steps to converge, which is very efficient. 

To further illustrate the advantage of our proposed algorithm over first-order approaches, we 
randomly generate 50 network examples with 30 nodes and six sessions. We compare the number of 
iterations for our proposed algorithm and the subgradient algorithm, and the results are shown in 
Fig. 11. For these 50 examples, the mean numbers of iterations for our distributed Newton method 
and the subgradient method are 779.3 and 61115.26. 
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Figure 10: Convergence behavior of the proposed distributed Newton algorithm for the five-node 
network example. 
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Figure 11: Convergence speed comparison between our proposed algorithm and the subgradient 
algorithm over 50 randomly generated network examples. 

8 Conclusion 

A Proof of Theorem 6.6 

First, note from (24) that X/ can be decomposed into a diagonal matrix with a rank-one update as 
follows: 

X / = D + ll-l T , 
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where D is defined as D = Diag j(a^) 2 , . . . , (x\ F ^) 2 j. Now, let u = -^-1 and v = 1. Then, using 
the Sherman-Morrison- Woodbury formula [22], we have that 

X,-' = D- - D '' U ^ D ;' . (43) 

Since D is diagonal, we have D 1 = Diag |(x|^) 2 , . . . , (a;| F ^) 2 |. The denominator of the second 
term in (43) can thus be computed as follows: 

1 + v T D _1 u 

= l + ^Diag{(x«) 2 ,...,(^) 2 }l 

V F (x {f) ) 2 

= 1 + } ■ (44) 

The numerator of the second term in (43) can be computed as 

D-WD- 1 = (45) 
°l 

where the entries of Q are 

\x\ h) f, if 1</1 = /2<F, 



(Q)/l/2 



(x; /i} x[ /2) ) 2 , ifi</i,/ 2 <F,/i^/ 2 . 



From (44) and (45), we obtain that 

D 1 uv T D 1 Q 



x. 



|2 



Hence, the diagonal entries of X^ 1 can be computed as 

„(/i)\ 4 



cv-i\ _ (jf)\2 y^i j 

fl) ( F} T 

Now, define a vector x; = [x\ , , . . . , x\ , 5{\ . Then, the right-hand-side of (46) can be further 
written as 



On the other hand, the off-diagonal entries of can be computed as 

' ^ + Yf f =M } ) 2 INI 2 

Combining (47) and (48), we obtain (25). This completes the proof. 
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B Proof of Theorem 6.7 



First, note that 



M T w fc 



(b«) 5 



-(ai 1} ) T 




w 



(i) 



w 



(F) 



(b( 1 )) T w( 1 ) 



(b( F )) T w( F ) 




(fhT~(F) 




-'Rx(L) 



Tx(L) 



where the last equality holds due to the special structure of and af} F \ More specifically, notice 
that b^) is simply a unit vector where all entries are zeros except for a "1" at the entry corresponding 
to the node Src(/). Thus, we have (b^) T w^ = ^grc(/)" Likewise, has two non-zero entries (a 
"1" corresponding to node Tx(Z) and a "—1" corresponding to node Rx(Z)) or only one non-zero entry 
when one end point of link Z happens to be Dst(/). Thus, we have — (a^) T w^ = — 
(recall that we have defined w^ t ^ = 0). Hence, 



(V/(y fe ) + M T w«) 



-tU' f (sf) - j- + 



■ w sl(f) 
(/) _ ? 75(/) 

7 Rx(0 W Tx(Z) 



if 1 < i = f < F, 
Xi = (l + l)F + f. 



Recall that H^ 1 = Diag {S" 1 , X^ 1 , . . . , X^ 1 }. Thus, we have that Ay k = -H" 1 (V/(y fe ) + 
M T w( fc )) can be partitioned into L + 1 F-dimension vectors. For the entries in the first vector, since 
S _1 is diagonal, we have 



(Ay ^ = s j^ SfU f {Sf) + 1 " s f w( £(f) 



1 - tsjU'^Sf) 

The remaining of the F-dimensional vectors are of the form 



X 



-i 



Si 



(i) + W Rx(0 



1 + 7» (F) 



Rx(0 



w, 



w, 



(1) 

Tx(0 

(F) 
Tx(0 



if 1 < i = f < F. 



1 = 1,. 



,L. 



(49) 
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Using the structural result of X ; 1 in (24), we have that the i-th entry of Ay fe , where i = (l + l)F + f, 
can be computed as 



(Ay fc ), = (x?>) 



(/)\2 



' lJj ff 



1 - - i + w { J} rn - w[ f) 



X 



(/) Si Tx « fe « 



+ 



/'=!,/¥/ 



(50) 



Note that (49) and (50) are the same as (26) and (27), respectively. The proof is complete. 



C Proof of Lemma 6.8 

First, consider the diagonal entries in D/. Note that D/ = ^-b^ f \b^) T + Yli=i(,x^) 2 aS l f \a^) T . 
From Lemma 6.2, the i-th diagonal entry in a}"(a}") T is equal to 1 if the corresponding node of 
the i-th entry, say n, is either Tx(Z) or Rx(Z). Thus, when summing over all I, the number of ones is 
precisely given by the number of links that have node n either as its transmitting node or receiving 
node, i.e., the links that are in either O (n) and T{n). Thus, we have (^i = \{x\^) 2 ^/^ (d/^) T )ii = 
^2lei(n)uO(n)( x [^) 2 • Also, from Lemma 6.1, we have that the i-th diagonal entry is equal to 1 if 
n = Src(/). Hence, we have 



^2leO(n)ul(n)( x \ : ) 2 + ~t ^ row * corresponds to node n and n = Src(/), 
EieO(n)ui(n)( x( i f) ) 2 otherwise, 



which is the same expression as in Lemma 6.8. 

Next, consider the off-diagonal entries in Dj. Again, from Lemma 6.2, we know that the (i,j)-th 
entry in a^(a^) T is equal to —1 if the corresponding nodes of the (i,j)-th entry, say n\ and ri2, 
are Tx(Z) and Rx(Z), or vice versa. Thus, when summing over all I, the number of —1 entries is 
precisely given by the number of links that have nodes n\ and n<i either as their transmitting node 
and receiving node, i.e., the links that are in Y{ri\^n-i). Hence, we have 



(D/) 



~ J2ier(m n 2 )( x \^) 2 ^ row * an< ^ c °l umn j correspond to two connected nodes n\ and ri2, 



otherwise, 
which is the same expression as in Lemma 6.8, and the proof is complete. 
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D Proof of Theorem 6.14 

First, note that MH^ 1 M T y because f(y) is convex. Hence, (Afc + aft) — (aft — fl k ) is positive 
definite. Next, we check the positive definiteness of (Afc + aft) + (aft — ft k ). Note that 

(A* + aft) + (aH - fl k ) = A k + 2aTt k - fl k . (51) 

From the definition of Afc, Lemma 6.8, and Lemma 6.9, we have that all diagonal entries in Afc are 
positive. Hence, A^ y 0. On the other hand, by the definitions of ft k and ft k , we have that the 
entries of each row in 2aft k — ft k satisfy 

(2aft k - fl k ) u - l(2«n fc - ft k )ij\ 
= (2a-l)Y\(ft k ) ij \>0, fora>i. 

Also, it is clear from the definitions of ft k and ft k that (2aft k — ft k )u > 0. Thus, 2aft k — ft k is 
diagonally dominant and hence positive definite. Therefore, Afc + 2aft k — ft k is also positive definite, 
and the proof is complete. 

E Proof of Proposition 6.15 

To establish Proposition 6.15, we need the following result [9, Theorem 2.3]: 

Lemma E.l. Let A = Mi — Ni = M 2 - N 2 be two splittings of A, where A -1 y 0, M^f 1 y 0, and 
M^ 2 y 0. IfM^ 1 y MJ 1 , then ^(M^Ni) < p(M;^N 2 ). 

Now, for i < a\ < a 2 , since A k + aft k is diagonal, we have that 

(A fe + a 2 ft k )ii - (Afc + aift k )u 

= (a 2 - a^ WkU > 0. (52) 

Also, since ((Afc + aft k )~ 1 )u = l/(A k + aft k )u (from the diagonal property again), Eq. (52) implies 
that (Afc + aift k )~ l y (A k + a 2 fifc) _1 . Thus, Proposition 6.15 simply follows from Lemma E.l, and 
the proof is complete. 

F Proof of Theorem 6.16 

To show (36), we need to compute the element-wise expansion of (35). First, note that (Afc + aft k ) 
is diagonal, and so its inverse can be easily computed by taking the inverse of each diagonal entry. 
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Therefore, we start by computing each diagonal entry in (A& + aflk)- To this end, we first define an 
index function /3/(n), n / Dst(/), as follows: 



n if n < Dst(/), 
n-1 ifn>Dst(/). 



(53) 



Since contains the main diagonal of MH fc M , from Theorem 6.11, we obtain that 



(A 



kjii 



'ew^^i - w) + -^ ( ,; )+r v if n = Src(/) ' 



S$(n)( X | /) ) 2 ( 1 



ll x i 



(54) 



if n ^ Src(/), 



where the index z satisfies i = (/ — l)(N — 1) + (3f(n). Notice that each diagonal entry in fl^ is the 
row sum of non-diagonal entries in MH^M 7 " . Thus, from Theorem 6.11, we obtain that 



®..= E (*l fl > 2 (i-M)+ E E 

;€$(n)\*(n,/) 11 



)%\\ 2 



(55) 



/'=i,^//e*(n,/') 

Therefore, using the indicator function l*( n ,/) and adding (54) and (55), we obtain that 



(A fe + aft k ) 



Ei 6 »(„)[l + «(1 " l»(n,/)(0)](x, (/) ) 2 (l " + 
Ef =i I#/ ( Ei 6 ^/') Q( 1^ ))2 ) if n ^ Src(/), 



Ei 6 *(„)[l + a(l-l*(n,/)(0)](^ (/) ) 2 (l 

,(/)„(/'K2> 



+ 



EF ( v-^ a(x, u; x, u ; ) 2 ^ i 1 q / j"i 



( f) 

which is the same as the definition of U%'(k) in (37). 

Next, consider the entries in (afifc — fifc)w fe . Recall from Theorem 6.11 that the matrix MH^M 7 
has a partitioned matrix structure. Thus, the vector (ottok — f2&)w fc can be partitioned into F blocks, 
where each block is of the form 



((aftfc - ri fc )w fc ) / = -R / w / + ^ G //' w 

f'=i,*f 



(/') 



/ = 1 



, . . . , j. , 



(56) 



where R/ is obtained by replacing the main diagonal of Dj — Dj with the corresponding entries in 
— afifc. Hence, by computing the entries in — Hfw* and noting the special structure in Rj, where it 



36 



only contains entries 1, —1, and 0, we have 



(/") 2 



E (^) 2 (i-^)(^ (i) — 



(/) > 

Tx(i)' 



i60(n)\*(n,/) 

F / rv(r (/) r (/,) l 2 N 



f f) 

which is the same as the definition of V^\ (k) in (38). Similarly, by computing the entries in 
X/f =i ^/ Gff/-w(f'\ we have 



S G //' w 

/'=!,#/ 



(/') 



E (( E E ^^k 1 )^,, -*©,>• 



which is the same as the definition of V^(k) in (39). 

Finally, consider the term MH^V f(y k ). Note that MH^V/ (y fc ) can be decomposed into 



MH^V/(y*) = BS- l V s f(y k ) + £ -A l X^ 1 V xl f(y k ), 



i=i 



where s = [si, . . . , sf] t and x z = [x^\ . . . , x^] T . Accordingly, consider first the term BS 1 V s /(y fe ). 
Using the diagonal structure of B and S, it is easy to obtain that 



(BS-'V./Cy*))!/' 



o 



otherwise. 



Recalling that H fc 1 can be decomposed into a diagonal matrix and a rank-one update matrix, we 
have 



-A,X 



^/(y*) = -A.Diag {(x z (1) ) 2 , • • • , ri F) ) 2 } 



l|x/|P 



(1)n4 



00 



J_ 1 



J_ _ 1 



(l)J^))2 



(x] >x i 



( *r } ) 4 



+ 



S, „m 



Si TP 
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Hence, computing each term in the above decomposition, then adding BS 1 V s /(y fc ), and then 
summing over all /, we obtain that 



(MH-'V/ty*)) 



n 



Jf) 
±1 

Si 



E 



v n v F ^ (/) ) 2 T (/') 



^ (4 f) ) 2 jf) 



/'=i "pip" z 

(/)x2 



/ZgJ(n) 



if n ^ Src(/), 



which is the same as the definition of Wn \k) as in (40). Thus, the result in (36) simply follows from 
Theorem 6.14, and the proof is complete. 



G Proof of Proposition 6.17 



Define the following two vectors: As = [Asi, . . . , A.sf] t and Axj = [x^\ . . . , x^] T . Also, from the 
decomposable structure of we have 



(Ay) T H fc Ay = (As) T SAs + ^(Ax,) T X, Ax,. 



(57) 



l=i 



Now, consider first (As) T SAs, which, due to the diagonal structure of H^, can be simply computed 



as 



(Asf SAs = f> S/ ) 2 ( - tU'J(s f ) + J^) . 



/=i 



(58) 



Next, we consider ^^ 1 (Ax,) T X,Ax;. Recall that X; can be further decomposed into a diagonal 
matrix plus a rank-one update matrix. Thus, we have 

L 



^(Ax ; ) T X ; Ax, 



=i 



£ [ 4 } 



x 



(F) 



\ 



fe (1) ) 2 



W 7 ) 







r r (l) i 


1 rp 
+ -1-1 T 






Ol 








JF) 


) 




. X l 



L r F /At (/) n2 i F 



Jf) J + Sl 



(59) 



1=1 L /=l f=l 

Thus, adding (58) and (59) gives the desired result in Proposition 6.17, and the proof is complete. 
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